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In Hybrid Monte Carlo(HMC) simulations for full QCD, the gauge fields evolve smoothly as a function of 
Molecular Dynamics (MD) time. Thus we investigate improved methods of estimating the trial solutions to the 
Dirac propagator as superpositions of the solutions in the recent past. So far our best extrapolation method reduces 
the number of Conjugate Gradient iterations per unit MD time by about a factor of 4. Further improvements 
should be forthcoming as we further exploit the information of past trajectories. 
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1. INTRODUCTION 

The inclusion of internal fermion loops in the 
vacuum of QCD is a major challenge. The present 
state of the art for generating full QCD configu- 
rations is the so called Hybrid Monte Carlo algo- 
rithm which uses Molecular Dynamic evolution in 
a "fifth time" coordinate t. The Hamiltonian for 
this evolution is 



5= iTr + Sg{U)+^^ 
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where Pfi{x) are the angular momenta conjugate 
to the gauge fields U^i{x), Sg{U) is the pure gauge 
action, M{U) is the Dirac matrix and Lp is the 
pseudofermion field. In our discussion the precise 
form of the gauge action is not important. What 
is relevant is the need to accurately integrate the 
equations of motion, calculating the force on U 
due to the pseudofermions at each time step. This 
requires solving, over and over again, the linear 
equation. 



A{t) x{t) = ^, 



(2) 



where A{t) = M{U)Hl{U) and x{t) is the solu- 
tion of the inverted Dirac operator. Technically 
this is achieved by starting with a trial value Xtriai 
and iteratively solving Eq. ^ for x{i)- On the or- 
der of 100 MD steps are taken holding the pseud- 
ofermion source fix, so that the operator A(t) 
changes smoothly as a function of the MD time 
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t, as new values of U are generated. These it- 
erations, usually done by the conjugate gradient 
(CG) method, are the most computational expen- 
sive part of Hybrid Monte Carlo algorithms. 

This raises an obvious question. As we move 
in MD time, why are we not able to "learn" from 
the recent past enough about the space of likely 
solutions to vastly improve our iterative scheme? 

One should be able to give a very good esti- 
mate, Xtriai before starting the conjugate gradi- 
ent routine. A crucial ingredient in this approach, 
is the fact that detail balance is in principle pre- 
served independent of the starting trial configu- 
ration so long as one converges accurately to the 
solution. Therefore, we propose to estimate care- 
fully the starting configuration [Q. To accomplish 
this some information on the configurations in the 
previous MD steps have to be stored. Although 
these algorithms will use more memory, memory 
is often not a severe constraint in modern super 
computer simulations. 



2. ANALYTIC EXTRAPOLATIONS 

To motivate our extrapolation methods, con- 
sider the function as an analytic function of 
t. For simplicity of notation suppose we want the 
value at t = 0, given past values at ti, i2, is, ■ ■ ■. 
In practice this is usually a regular series of values 
ti = ~i e with a integrations step of size dt — e. 
Then a trial value for the the solution, x(0) of the 
new Dirac matrix A{Q), might be considered as a 
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linear superposition of old solutions, 

Xtriai = ci xih) + C2 x{h) H h cjv x(ijv). (3) 

If A^e is sufficiently small, wc may Taylor expand 
each term around t = and determine the coef- 
ficients by canceling all terms for e'^ to O(e^). 
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(4) 



As we will demonstrate this procedure is equiva- 
lent to the familiar N-1 order polynomial fit to N 
points. 

2.1. Polynomial Extrapolation 

Good result can be obtained even with a si mple 
polynomial fit of degree — 1 . To estimate the 
configuration it is necessary to store in memory 
the previous (A^-|-l) configurations. However the 
polynomial extrapolation does not require signif- 
icant computational effort, it is just a local sum 
on each lattice point with fixed coefficient, that 
is less than a single CG step. The Xtriai is ex- 
pressed as a polynomial, Xtnai = 2/1+^2/2 + 
• ■ • -I- t^~^ y^, whose coefficient satisfy the con- 
straint, 
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One can easily prove that Eq. |4| and Eq. || define 
identical extrapolation (j/i = J2i c-iX{U) = Xtriai- 
For equally spaced time steps ti = —i e, the coef- 
ficients given by 
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Table ^ shows the results of simulations for poly- 
nomial extrapolation. The number of CG steps 
needed to reach convergence is plotted in func- 
tion of the degree of extrapolation N, and the 
MD time step e. The case = 1 corresponds to 
starting with the old solution. 

2.2. Minimum Residual Extrapolation 

An alternate, perhaps more appealing, ap- 
proach is to consider the past history of solu- 
tions, xi^i) defining an important linear sub- 
space for seeking an optimal trial solution. Since 



the Conjugate Gradient method is in fact just a 
minimal residual technique confined to the Krylov 
subspace spanned by vectors A^~^xtriah why not 
start by looking at a "smarter" subspace based 
on past success for nearby times? 

In this spirit, we suggest minimizing the norm 
of the residual, 



(7) 



in the subspace spanned by Xi = x(^j)j where 
r ^ b — Mx and b = M'^ip. The minimization 
condition reduces to 
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The only problem is that this system can be ill 
conditioned because the past solutions Xi differ 
from each other by order e. Nevertheless if we 
only want to get the minimum of rV in span{xi) 
using a Gram Schmitt orthonormalization, we can 
solve the system avoiding the singularities. 

This method requires {N^ + 5N) /2 dots prod- 
uct and (N) Mx matrix-vector applications, and 
the storage of (N) past configuration. It is inter- 
esting to note that this method gives coefficients 
that for the first few order reproduce very close to 
the polynomial extrapolation. Table 2 show the 
number of CG steps using this method. 

3. CONCLUSIONS 

To compare efficiencies, the CG iterations 
should be divided by e, so we compare the total 
number of CG steps needed to evolve the system 
for a fixed distance in configuration space. Note 
that if e is too large, the overall performance is 
good, but the acceptance will drop drastically. If 
e is too small, the extrapolation is excellent, but 
the system will evolve too slowly in phase space. 
It is not trivial that we find a window in e where 
both the acceptance is good and the extrapolation 
works well. Moreover this window has parameters 
close to those used in actual simulations. 

Our present approach is clearly not the only 
one worth considering. In fact we have empha- 
sized the analytic properties of x(^) because it 
suggests ways to understand and further improve 
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Table 1 



CG steps using polynomial extrapolation 



N=l (previous data) 


0.98 


0.93 


0.93 


0.88 


0.85 


0.77 


0.63 


N=2 (I'** order extrap.) 


0.92 


1.07 


0.86 


0.74 


0.72 


0.62 


0.27 


N=3 (2"^^ order extrap.) 


0.91 


0.82 


0.74 


0.61 


0.59 


0.47 


0.21 


N=4 (3*'* order extrap.) 


0.96 


0.77 


0.56 


0.44 


0.41 


0.31 


0.21 


N=5 (4*'' order extrap.) 


0.86 


0.77 


0.54 


0.45 


0.42 


0.35 


0.26 


N=6 (5*'' order extrap.) 


0.70 


0.48 


0.39 


0.38 


0.40 


0.40 


0.41 


N=7 {6*^ order extrap.) 


0.70 


0.59 


0.40 


0.42 


0.42 


0.39 


0.46 


St = 10-^* 


15 


10 


9 


8 


7 


5 
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Number of CG steeps to reach the solution (residue < lO^^'^), normalized to 1 for x = 0. Full QCD 
configurations on 16^* lattice with Wilson fcrmions k = 0.157 and (3 — 5.6. Average on 30 configurations, 
statistical errors are of the order of 6%. 



Table 2 



CG steps using minimum residual extrapolation 



N=l 


0.99 


0.93 


0.92 


0.87 


0.84 


0.81 


0.64 


N=2 


0.87 


0.72 


0.71 


0.69 


0.63 


0.49 


0.25 


N=3 


0.85 


0.64 


0.57 


0.52 


0.49 


0.32 


0.06 


N=4 


0.72 


0.49 


0.41 


0.34 


0.28 


0.16 


0.10 


N=5 


0.72 


0.37 


0.31 


0.23 


0.19 


0.12 


0.07 


N=6 


0.51 


0.28 


0.26 


0.21 


0.16 


0.13 


0.06 


N=7 


0.49 


0.25 


0.24 


0.21 


0.19 


0.12 


0.06 


N=8 


0.40 


0.26 


0.23 


0.18 


0.15 


0.09 


0.06 


N=9 


0.38 


0.23 


0.19 


0.16 


0.13 


0.09 


0.06 


N=10 


0.39 


0.22 


0.19 


0.14 


0.12 


0.10 


0.04 


N=ll 


0.38 


0.18 


0.17 


0.14 


0.11 


0.08 


0.04 


St = 10-^* 


15 


10 


9 


8 


7 


5 


2 



Number of CG steeps to reach the solution (residue < 10 ^■^), normalized to 1 for x = 0. Same 



configurations of Table 1. Statistical errors are of the order of 4%. 



our approach. For example the failure at fixed e to 
improve the polynomial extrapolation by increas- 
ing indefinitely the number of terms is probably 
a signal of nearby singularities in t. Our success 
so far is probably due to low eigenvalues of the 
Dirac operator changing more slowly with time. 

Many other ways of using the past to avoid 
needless repetition can be imagined||^. For exam- 
ple the CG routine itself generates vectors, that 
may be useful than solutions in the more distant 
past. A Karman filter can be used top introduce 
exponential decreasing information from the past 
without increasing the storage requirement. The 
subspace of past vectors might be used not just 
as a way to arrive at an initial guess, but also 
as a way to precondition the iterative process it- 



self. For now, we are pleased for now that even a 
few vectors from the past can be combined with 
simple extrapolation ideas to give a very useful 
acceleration method. 
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